Abstract


This document describes the RTP payload format for the BroadVoice(R)
narrowband and wideband speech codecs.  The narrowband codec, called
BroadVoice16, or BV16, has been selected by CableLabs as a mandatory
codec in PacketCable 1.5 and has a CableLabs specification.  The
document also provides specifications for the use of BroadVoice with
MIME and the Session Description Protocol (SDP).


1. Introduction


This document specifies the payload format for sending BroadVoice
encoded speech or audio signals using the Real-time Transport
Protocol (RTP) [1].  The sender may send one or more BroadVoice codec
data frames per packet, depending on the application scenario, based
on network conditions, bandwidth availability, delay requirements,
and packet-loss tolerance.


The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [2].


2. Background 


BroadVoice is a speech codec family developed for VoIP (Voice over 
Internet Protocol) applications, including Voice over Cable, Voice 
over DSL, and IP phone applications.  BroadVoice achieves high speech 
quality with a low coding delay and relatively low codec complexity. 


The BroadVoice codec family contains two codec versions.  The
narrowband version of BroadVoice, called BroadVoice16 [3], or BV16
for short, encodes 8 kHz-sampled narrowband speech at a bit rate of
16 kilobits/second, or 16 kbit/s.  The wideband version of
BroadVoice, called BroadVoice32, or BV32, encodes 16 kHz-sampled
wideband speech at a bit rate of 32 kbit/s.  The BV16 and BV32 use
very similar (but not identical) coding algorithms; they share most
of their algorithm modules.


To minimize the delay in real-time two-way communications, both the
BV16 and BV32 encode speech with a very small frame size of 5 ms
without using any look-ahead.  Using a packet size as small as 5 ms,
if necessary, allows VoIP systems based on BroadVoice to have a very
low end-to-end system delay.


BroadVoice also has relatively low codec complexity when compared
with ITU-T standard speech codecs based on CELP (Code Excited Linear
Prediction), such as G.728, G.729, G.723.1, and G.722.2.  Full-duplex
implementations of the BV16 and BV32 take around 12 and 17 MIPS, 
respectively, on general-purpose 16-bit fixed-point digital signal 
processors (DSPs). The total memory footprints of the BV16 and BV32, 
including program size, data tables, and data RAM, are around 12 
kwords each, or 24 kbytes. 


The PacketCable(TM) project of Cable Television Laboratories, Inc. 
(CableLabs(R)) has chosen the BV16 codec for use in VoIP telephone 
services provided by cable operators. More specifically, the BV16 
codec was selected as one of the mandatory audio codecs in the 
PacketCable(TM) 1.5 Audio/Video Codecs Specification [8] and has been 
implemented by multiple vendors. The wideband version (BV32) has 
been developed by Broadcom but has not yet appeared in a public 
specification; since it is technically very similar to BV16, its 
payload format is also defined in this document. 


3. RTP Payload Format for BroadVoice16 Narrowband Codec


The BroadVoice16 uses 5 ms frames and a sampling frequency of 8 kHz,
so the RTP timestamp MUST be in units of 1/8000 of a second.  The RTP
timestamp indicates the sampling instant of the oldest audio sample
represented by the frame(s) present in the payload.  The RTP payload
for the BroadVoice16 has the format shown in the figure below.  No
additional header specific to this payload format is required.


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      RTP Header [1]                           |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               |
   +                 one or more frames of BroadVoice16            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


If BroadVoice16 is used for applications with silence compression,
the first BroadVoice16 packet after a silence period during which
packets have not been transmitted contiguously SHOULD have the marker
bit in the RTP data header set to one.  The marker bit in all other
packets is zero.  Applications without silence suppression MUST set
the marker bit to zero.


The assignment of an RTP payload type for this new packet format is
outside the scope of this document, and will not be specified here.
It is expected that the RTP profile for a particular class of
applications will assign a payload type for this encoding, or if that
is not done, then a payload type in the dynamic range shall be
chosen.


3.1. BroadVoice16 Bit Stream Definition


The BroadVoice16 encoder operates on speech frames of 5 ms
corresponding to 40 samples at a sampling rate of 8000 samples per
second.  For every 5 ms frame, the encoder encodes the 40 consecutive
audio samples into 80 bits, or 10 octets.  Thus, the 80-bit bit
stream produced by the BroadVoice16 for each 5 ms frame is octet-
aligned, and no padding bits are required.  The bit allocation for
the encoded parameters of the BroadVoice16 codec is listed in the
following table.


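   Encoded Parameter        Codeword      Number of bits per frame
   ------------------------------------------------------------
   Line Spectrum Pairs      L0,L1             7+7=14
   Pitch Lag                PL                     7
   Pitch Gain               PG                     5
   Log-Gain                 LG                     4
   Excitation Vectors       V0,...,V9        5*10=50
   ------------------------------------------------------------
   Total:                                     80 bits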


The mapping of the encoded parameters in an 80-bit BroadVoice16 data
frame is defined in the following figure.  This figure shows the bit
packing in "network byte order", also known as big-endian order.  The
bits of each 32-bit word are numbered 0 to 31, with the most
significant bit on the left and numbered 0.  The octets (bytes) of
each word are transmitted with the most significant octet first.  The
bits of the data field for each encoded parameter are numbered in the
same order, with the most significant bit on the left.


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     L0      |     L1      |     PL      |   PG    |  LG   |V0 |
   |             |             |             |         |       |   |
   |0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3|0 1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | V0  |   V1    |   V2    |   V3    |   V4    |   V5    |  V6   |
   |     |         |         |         |         |         |       |
   |2 3 4|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4|0 1 2 3|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V|   V7    |   V8    |   V9    |
   |6|         |         |         |
   |4|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


                 Figure 1: BroadVoice16 bit packing
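

As an illustration of the big-endian bit packing described in this
section, the following Python sketch splits one 10-octet BroadVoice16
frame into its codeword indices.  The names BV16_FIELDS and
unpack_bv16_frame are hypothetical and are not part of any BroadVoice
reference implementation; the field order and widths assume the
allocation 7+7+7+5+4 bits followed by ten 5-bit excitation indices.

```python
# Field order and widths assumed from the BroadVoice16 bit allocation:
# L0(7) L1(7) PL(7) PG(5) LG(4) V0..V9 (5 bits each) = 80 bits.
BV16_FIELDS = [("L0", 7), ("L1", 7), ("PL", 7), ("PG", 5), ("LG", 4)] + [
    (f"V{i}", 5) for i in range(10)
]


def unpack_bv16_frame(frame: bytes) -> dict:
    """Split one 10-octet BV16 frame into its codeword indices."""
    if len(frame) != 10:
        raise ValueError("a BV16 frame is exactly 10 octets")
    # Network byte order: the first octet holds the most significant bits.
    bits = int.from_bytes(frame, "big")
    remaining = 80
    fields = {}
    for name, width in BV16_FIELDS:
        remaining -= width
        fields[name] = (bits >> remaining) & ((1 << width) - 1)
    return fields
```

Treating the whole frame as a single 80-bit integer avoids handling
the fields that straddle 32-bit word boundaries (V0 and V6) as special
cases.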


3.2. Multiple BroadVoice16 Frames in an RTP Packet


More than one BroadVoice16 frame MAY be included in a single RTP
packet by a sender.  Senders have the following additional
restrictions:


o SHOULD NOT include more BroadVoice16 frames in a single RTP
  packet than will fit in the MTU of the RTP.


o MUST NOT split a BroadVoice16 frame between RTP packets.


o BroadVoice16 frames in an RTP packet MUST be consecutive.


Since multiple BroadVoice16 frames in an RTP packet MUST be
consecutive, and since BroadVoice16 has a fixed frame size of 5 ms,
recovering the timestamps of all frames within a packet is easy.  The
oldest frame within an RTP packet has the same timestamp as the RTP
packet, as mentioned above.  To obtain the timestamp of the frame
that is N frames later than the oldest frame in the packet, one
simply adds 5*N ms worth of time units (40*N timestamp units at the
8 kHz clock) to the timestamp of the RTP packet.
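

The timestamp arithmetic above can be sketched in Python as follows.
The function name frame_timestamp and its parameters are hypothetical.
The sketch assumes the standard 32-bit RTP timestamp field, so the
result wraps modulo 2**32.

```python
FRAME_MS = 5           # fixed BroadVoice frame duration
BV16_CLOCK_HZ = 8000   # RTP clock rate for BV16


def frame_timestamp(packet_ts: int, n: int, clock_hz: int = BV16_CLOCK_HZ) -> int:
    """Timestamp of the frame that is n frames after the oldest one."""
    ticks_per_frame = clock_hz * FRAME_MS // 1000  # 40 at 8 kHz
    # RTP timestamps are 32-bit unsigned values and wrap around.
    return (packet_ts + n * ticks_per_frame) & 0xFFFFFFFF
```

For example, with a packet timestamp of 1000, the fourth frame in the
packet (n = 3) has timestamp 1000 + 3*40 = 1120.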


It is RECOMMENDED that the number of frames contained within an RTP
packet be consistent with the application.  For example, in a
telephony application where delay is important, the fewer frames per
packet, the lower the delay; whereas, for a delay-insensitive
streaming or messaging application, many frames per packet would be
acceptable.


Information describing the number of frames contained in an RTP
packet is not transmitted as part of the RTP payload.  The only way
to determine the number of BroadVoice16 frames is to count the total
number of octets within the RTP payload, and divide the octet count
by 10.
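

A receiver might recover the individual frames along the following
lines.  This is only a sketch: the name split_frames is hypothetical,
and rejecting a payload whose length is not a whole number of frames
is a choice made here for illustration, since this document does not
say how a receiver should treat such a payload.

```python
def split_frames(payload: bytes, frame_octets: int = 10) -> list:
    """Split an RTP payload into fixed-size BroadVoice frames.

    frame_octets is 10 for BV16; pass 20 for BV32.
    """
    if len(payload) == 0 or len(payload) % frame_octets != 0:
        # Assumption of this sketch: malformed payloads are rejected.
        raise ValueError("payload is not a whole number of frames")
    return [
        payload[i:i + frame_octets]
        for i in range(0, len(payload), frame_octets)
    ]
```

The number of frames is then simply the length of the returned list,
that is, the payload octet count divided by the frame size.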


4. RTP Payload Format for BroadVoice32 Wideband Codec 


The BroadVoice32 uses 5 ms frames and a sampling frequency of 16 kHz, 
so the RTP timestamp MUST be in units of 1/16000 of a second. The 
RTP timestamp indicates the sampling instant of the oldest audio 
sample represented by the frame(s) present in the payload. The RTP 
payload for the BroadVoice32 has the format shown in the figure 
below. No additional header specific to this payload format is 
required. 


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      RTP Header [1]                           |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               |
   +                 one or more frames of BroadVoice32            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


If BroadVoice32 is used for applications with silence compression, 
the first BroadVoice32 packet after a silence period during which 
packets have not been transmitted contiguously SHOULD have the marker 
bit in the RTP data header set to one. The marker bit in all other 
packets is zero. Applications without silence suppression MUST set 
the marker bit to zero. 


The assignment of an RTP payload type for this new packet format is 
outside the scope of this document, and will not be specified here. 
It is expected that the RTP profile for a particular class of
applications will assign a payload type for this encoding, or if that 
is not done, then a payload type in the dynamic range shall be 
chosen. 


4.1. BroadVoice32 Bit Stream Definition 


The BroadVoice32 encoder operates on speech frames of 5 ms 
corresponding to 80 samples at a sampling rate of 16000 samples per 
second. For every 5 ms frame, the encoder encodes the 80 consecutive 
audio samples into 160 bits, or 20 octets. Thus, the 160-bit bit 
stream produced by the BroadVoice32 for each 5 ms frame is octet- 
aligned, and no padding bits are required.  The bit allocation for
the encoded parameters of the BroadVoice32 codec is listed in the
following table.


                                                    Number of bits
   Encoded Parameter                   Codeword        per frame
   ------------------------------------------------------------
   Line Spectrum Pairs                 L0,L1,L2        7+5+5=17
   Pitch Lag                           PL                     8
   Pitch Gain                          PG                     5
   Log-Gains (1st & 2nd subframes)     LG0,LG1           5+5=10
   Excitation Vectors (1st subframe)   VA0,...,VA9      6*10=60
   Excitation Vectors (2nd subframe)   VB0,...,VB9      6*10=60
   ------------------------------------------------------------
   Total:                                              160 bits


The mapping of the encoded parameters in a 160-bit BroadVoice32 data
frame is defined in the following figure.  This figure shows the bit
packing in "network byte order", also known as big-endian order.  The
bits of each 32-bit word are numbered 0 to 31, with the most
significant bit on the left and numbered 0.  The octets (bytes) of
each word are transmitted with the most significant octet first.  The
bits of the data field for each encoded parameter are numbered in the
same order, with the most significant bit on the left.


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     L0      |   L1    |   L2    |      PL       |   PG    |LG0|
   |             |         |         |               |         |   |
   |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7|0 1 2 3 4|0 1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | LG0 |   LG1   |    VA0    |    VA1    |    VA2    |    VA3    |
   |     |         |           |           |           |           |
   |2 3 4|0 1 2 3 4|0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    VA4    |    VA5    |    VA6    |    VA7    |    VA8    |VA9|
   |           |           |           |           |           |   |
   |0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|0 1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  VA9  |    VB0    |    VB1    |    VB2    |    VB3    |  VB4  |
   |       |           |           |           |           |       |
   |2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |VB4|    VB5    |    VB6    |    VB7    |    VB8    |    VB9    |
   |   |           |           |           |           |           |
   |4 5|0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


                 Figure 2: BroadVoice32 bit packing
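

On the sending side, the same packing rule can be applied in the
opposite direction.  The Python sketch below assembles a 20-octet
BroadVoice32 frame from a dictionary of codeword indices.  The names
BV32_FIELDS and pack_bv32_frame are hypothetical; the field order and
widths are taken from the bit allocation table in this section.

```python
# Field order and widths from the BroadVoice32 bit allocation table:
# 7+5+5 (LSP) + 8 (PL) + 5 (PG) + 5+5 (log-gains) + 20 * 6 = 160 bits.
BV32_FIELDS = (
    [("L0", 7), ("L1", 5), ("L2", 5), ("PL", 8), ("PG", 5),
     ("LG0", 5), ("LG1", 5)]
    + [(f"VA{i}", 6) for i in range(10)]
    + [(f"VB{i}", 6) for i in range(10)]
)


def pack_bv32_frame(fields: dict) -> bytes:
    """Pack BV32 codeword indices into one 20-octet frame."""
    bits = 0
    for name, width in BV32_FIELDS:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        # Append each field after the previous one, MSB first.
        bits = (bits << width) | value
    return bits.to_bytes(20, "big")  # network byte order
```

Because the widths sum to exactly 160 bits, the result always fills
20 octets with no padding.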


4.2. Multiple BroadVoice32 Frames in an RTP Packet


More than one BroadVoice32 frame MAY be included in a single RTP
packet by a sender.  Senders have the following additional
restrictions:


o SHOULD NOT include more BroadVoice32 frames in a single RTP
  packet than will fit in the MTU of the RTP.


o MUST NOT split a BroadVoice32 frame between RTP packets.


o BroadVoice32 frames in an RTP packet MUST be consecutive.
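

To illustrate the first restriction, the following Python sketch
estimates how many whole frames fit in one packet for a given MTU.
The name max_frames_per_packet is hypothetical, and the default of 40
header octets is an assumption: an IPv4 header without options (20),
a UDP header (8), and a fixed RTP header without CSRC entries or
extensions (12).  Other transports need a different value.

```python
def max_frames_per_packet(mtu: int, frame_octets: int = 20,
                          header_octets: int = 40) -> int:
    """Largest number of whole frames that fit within the MTU.

    frame_octets is 20 for BV32; pass 10 for BV16.
    header_octets defaults to IPv4 (20) + UDP (8) + RTP (12), an
    assumption of this sketch.
    """
    return max(0, (mtu - header_octets) // frame_octets)
```

With a 1500-octet MTU this gives 73 BroadVoice32 frames, far more than
a delay-sensitive application would normally place in one packet.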


Since multiple BroadVoice32 frames in an RTP packet MUST be 
consecutive, and since BroadVoice32 has a fixed frame size of 5 ms, 
recovering the timestamps of all frames within a packet is easy. The 
oldest frame within an RTP packet has the same timestamp as the RTP 
packet, as mentioned above.  To obtain the timestamp of the frame
that is N frames later than the oldest frame in the packet, one
simply adds 5*N ms worth of time units (80*N timestamp units at the
16 kHz clock) to the timestamp of the RTP packet.


It is RECOMMENDED that the number of frames contained within an RTP 
packet be consistent with the application. For example, in a 
telephony application where delay is important, the fewer frames per 
packet, the lower the delay; whereas, for a delay-insensitive
streaming or messaging application, many frames per packet would be 
acceptable. 


Information describing the number of frames contained in an RTP 
packet is not transmitted as part of the RTP payload. The only way 
to determine the number of BroadVoice32 frames is to count the total 
number of octets within the RTP payload, and divide the octet count 
by 20. 


5. IANA Considerations 


Two new MIME sub-types, as described in this section, have been 
registered. 


The MIME names for the BV16 and BV32 codecs have been allocated from 
the IETF tree since these two codecs are expected to be widely used 
for Voice-over-IP applications, especially in Voice over Cable 
applications.


5.1. MIME Registration of BroadVoice16 for RTP


MIME media type name: audio


MIME media subtype name: BV16


Required parameter: none


Optional parameters:
ptime: Defined as usual for RTP audio (see RFC 2327 [4]).


maxptime: See RFC 3267 [7] for its definition. The maxptime 
SHOULD be a multiple of the duration of a single codec data 
frame (5 ms). 


Encoding considerations: 
This type is defined for transferring BV16-encoded data via RTP 
using the payload format specified in Section 3 of RFC 4298. 
Audio data is binary data and must be encoded for non-binary 
transport; the Base64 encoding is suitable for Email. 


Security considerations: 
See Section 7 "Security Considerations" of RFC 4298. 


Public specification: 
The BroadVoice16 codec has been specified in [3].


Intended usage: 
COMMON. It is expected that many VoIP applications, especially 
Voice over Cable applications, will use this type. 

Person & email address to contact for further information: 
Juin-Hwey (Raymond) Chen 
rchen@broadcom.com 

Author/Change controller:
Author: Juin-Hwey (Raymond) Chen, rchen@broadcom.com
Change Controller: IETF Audio/Video Transport Working Group
delegated from the IESG


5.2. MIME Registration of BroadVoice32 for RTP


MIME media type name: audio


MIME media subtype name: BV32


Required parameter: none


Optional parameters:
ptime: Defined as usual for RTP audio (see RFC 2327 [4]).


maxptime: See RFC 3267 [7] for its definition. The maxptime 
SHOULD be a multiple of the duration of a single codec data 
frame (5 ms). 


Encoding considerations: 
This type is defined for transferring BV32-encoded data via RTP 
using the payload format specified in Section 4 of RFC 4298. 
Audio data is binary data and must be encoded for non-binary 
transport; the Base64 encoding is suitable for Email. 


Security considerations: 
See Section 7 "Security Considerations" of RFC 4298. 


Intended usage: 
COMMON. It is expected that many VoIP applications, especially 
Voice over Cable applications, will use this type. 


Person & email address to contact for further information: 
Juin-Hwey (Raymond) Chen 
rchen@broadcom.com 


Author/Change controller: 
Author: Juin-Hwey (Raymond) Chen, rchen@broadcom.com 
Change Controller: IETF Audio/Video Transport Working Group 
delegated from the IESG 


6. Mapping to SDP Parameters 


The information carried in the MIME media type specification has a
specific mapping to fields in the Session Description Protocol (SDP)
[4], which is commonly used to describe RTP sessions.  When SDP is
used to specify sessions employing the BroadVoice16 or BroadVoice32
codec, the mapping is as follows:


- The MIME type ("audio") goes in SDP "m=" as the media name.


- The MIME subtype (payload format name) goes in SDP "a=rtpmap"
  as the encoding name.  The RTP clock rate in "a=rtpmap" MUST be
  8000 for BV16 and 16000 for BV32.


- The parameters "ptime" and "maxptime" go in the SDP "a=ptime"
  and "a=maxptime" attributes, respectively.


An example of the media representation in SDP for describing BV16
might be:


m=audio 49120 RTP/AVP 97 
a=rtpmap:97 BV16/8000 


An example of the media representation in SDP for describing BV32 
might be: 


m=audio 49122 RTP/AVP 99 
a=rtpmap:99 BV32/16000 
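

The mapping rules in this section could be applied programmatically
along the following lines.  This Python sketch is illustrative only:
the name sdp_media_lines and its parameters are hypothetical, and
rejecting a maxptime that is not a multiple of 5 ms is stricter than
the SHOULD in the MIME registrations.

```python
# RTP clock rates required in "a=rtpmap" for each codec.
CLOCK_RATES = {"BV16": 8000, "BV32": 16000}


def sdp_media_lines(codec, port, payload_type, ptime=None, maxptime=None):
    """Build the SDP media-level lines for a BV16 or BV32 stream."""
    if codec not in CLOCK_RATES:
        raise ValueError("codec must be BV16 or BV32")
    lines = [
        f"m=audio {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} {codec}/{CLOCK_RATES[codec]}",
    ]
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        # Choice made in this sketch: enforce the 5 ms frame multiple.
        if maxptime % 5 != 0:
            raise ValueError("maxptime should be a multiple of 5 ms")
        lines.append(f"a=maxptime:{maxptime}")
    return lines
```

Calling sdp_media_lines("BV16", 49120, 97) reproduces the two lines of
the BV16 example above.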


6.1. Offer-Answer Model Considerations 


No special considerations are needed for using the SDP Offer/Answer
model [5] with the BV16 and BV32 RTP payload formats.


7. Security Considerations 


RTP packets using the payload format defined in this specification 
are subject to the security considerations discussed in the RTP 
specification [1] and any appropriate profile (for example, [6]). 
This implies that confidentiality of the media streams is achieved by 
encryption. 


A potential denial-of-service threat exists for data encoding using 
compression techniques that have non-uniform receiver-end 
computational load. The attacker can inject pathological datagrams 
into the stream that are complex to decode and cause the receiver to 
become overloaded. However, the encodings covered in this document 
do not exhibit any significant non-uniformity. 


8. Congestion Control 


The general congestion control considerations for transporting RTP
data apply to BV16 and BV32 audio over RTP as well (see RTP [1] and
any applicable RTP profile, such as AVP [6]).  BV16 and BV32 do not have
any built-in mechanism for reducing the bandwidth. Packing more 
frames in each RTP payload can reduce the number of packets sent, and 
hence the overhead from IP/UDP/RTP headers, at the expense of 
increased delay and reduced error robustness against packet losses. 
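

The trade-off between header overhead and frames per packet can be
quantified with a short calculation.  The Python sketch below uses a
hypothetical function name and assumes 40 octets of headers per packet
(IPv4 without options, UDP, and a fixed 12-octet RTP header);
link-layer overhead is ignored.

```python
FRAME_MS = 5  # fixed BroadVoice frame duration


def total_bitrate_kbps(frames_per_packet: int, frame_octets: int,
                       header_octets: int = 40) -> float:
    """IP-level bit rate in kbit/s, including IP/UDP/RTP headers.

    frame_octets is 10 for BV16 and 20 for BV32.
    header_octets = 20 (IPv4) + 8 (UDP) + 12 (RTP) is an assumption.
    """
    packet_octets = header_octets + frames_per_packet * frame_octets
    packet_ms = FRAME_MS * frames_per_packet
    return packet_octets * 8 / packet_ms  # bits per ms equals kbit/s
```

Under these assumptions, BV16 with one frame per packet consumes
80 kbit/s at the IP level, while four frames per packet (20 ms of
speech) reduces this to 32 kbit/s.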